Adding 12 select genomic sequences with known structure to v0.1.1

Running BSMAP


robertsmac:bsmap-2.42 sr320$ ./bsmap -a /Volumes/Bay3/Software/bismark_v0.6.4/filtered_Unlabeled_NoIndex_L003_R1_trimmed.fastq -d  /Volumes/Bay3\ scratch/tmp/cgigas_alpha_v012.fa -o /Volumes/Bay3\ scratch//tmp/BSMAP_output_trimmed_v0_1_2.sam -p 1


NOTE: might want to check the sequences for adapters.
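A quick way to act on that note is to count how many reads still contain an adapter sequence. The sketch below assumes the adapter is the common Illumina TruSeq prefix `AGATCGGAAGAGC`. That is an assumption, not something recorded here, so swap in the adapter actually used for this library. It only catches exact, full-length matches; partial adapter read-through at the 3' end would need a proper trimmer.

```python
# Assumption: AGATCGGAAGAGC is the start of the standard Illumina TruSeq
# adapter; replace it with the adapter actually used for this library.
ADAPTER = "AGATCGGAAGAGC"


def count_adapter_reads(fastq_lines, adapter=ADAPTER):
    """Return (reads containing `adapter`, total reads) for FASTQ lines.

    FASTQ records are 4 lines long; the sequence is the 2nd line of each.
    Only exact, full-length matches of `adapter` are counted.
    """
    total = hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # sequence line of the current record
            total += 1
            if adapter in line:
                hits += 1
    return hits, total
```

Usage: open `filtered_Unlabeled_NoIndex_L003_R1_trimmed.fastq` and pass the file handle to `count_adapter_reads`. The function streams line by line, so a large FASTQ is never held in memory.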


OUTPUT
http://aquacul4.fish.washington.edu/~steven/filefish/BSMAP_output_trimmed_v0_1_2.sam





Will need to run the methratio script (methratio.py).

7.1 methratio.py
Python script to extract methylation ratios from BSMAP mapping results. Requires Python 2.x.
For the human genome, methratio.py needs ~26GB of memory.
On systems with limited memory, the user can set the -c/--chr option to process only the specified chromosomes,
and then combine the results for all chromosomes afterwards.
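The per-chromosome workflow described above can be sketched as two small helpers: one builds a methratio.py command per chromosome using only the `-d`, `-c` and `-o` options listed in the README, and one concatenates the resulting files. The output naming (`methratio_<chrom>.txt`) is a hypothetical choice, and the header handling assumes a header, if present, is a line whose second column is not an integer coordinate. The helpers themselves are Python 3; methratio.py still needs a Python 2.x interpreter behind `python`.

```python
def chrom_commands(ref, sam, chroms, prefix="methratio_"):
    """Build one methratio.py argument list per chromosome (-c/--chr).

    Passing lists (rather than one shell string) to subprocess.run keeps
    paths that contain spaces intact. Output names are hypothetical.
    """
    return [
        ["python", "methratio.py", "-d", ref, "-c", chrom,
         "-o", f"{prefix}{chrom}.txt", sam]
        for chrom in chroms
    ]


def merge_methratio(paths, out_path):
    """Concatenate per-chromosome methratio outputs into one file.

    Blank lines are dropped; any line whose 2nd column is not an integer
    coordinate is treated as a header and written only once.
    """
    wrote_header = False
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as fh:
                for line in fh:
                    if not line.strip():
                        continue
                    line = line.rstrip("\n") + "\n"  # guard a missing final newline
                    fields = line.rstrip("\n").split("\t")
                    if len(fields) < 2 or not fields[1].isdigit():
                        if not wrote_header:
                            out.write(line)
                            wrote_header = True
                        continue
                    out.write(line)
```

Usage: run each list from `chrom_commands(...)` with `subprocess.run(cmd, check=True)`, then call `merge_methratio` on the per-chromosome files in the desired order.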

Usage: python methratio.py [options] BSMAP_MAPPING_FILES

Options:
  -h, --help            show this help message and exit
  -o FILE, --out=FILE   output file name. (required)
  -d FILE, --ref=FILE   reference genome fasta file. (required)
  -c CHR, --chr=CHR     process only specified chromosomes. [default: all]
                        example: --chr=chr1,chr2 (this uses ~4.5GB compared with ~26GB for the whole genome)
  -s PATH, --sam-path=PATH
                        path to samtools. [default: none]
  -u, --unique          process only unique mappings/pairs.
  -p, --pair            process only properly paired mappings.
  -z, --zero-meth       report loci with zero methylation ratios.
  -q, --quiet           don't print progress on stderr.

Output format: tab delimited txt file with the following columns:
    1) chromosome
    2) coordinate (1-based)
    3) strand
    4) sequence context (2nt upstream to 2nt downstream in Watson strand direction)
    5) methylation ratio
    6) number of reads covering this locus
    7) number of unconverted Cs in the reads at this locus
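To work with the output downstream, each line can be read into a record with the seven columns listed above. This is a minimal sketch; it assumes tab-separated columns in that order and skips blank lines and any header row (whether this BSMAP version writes a header is an assumption). Column 6 is parsed as a float in case counts are written with decimals.

```python
from typing import Iterator, NamedTuple, TextIO


class MethRatio(NamedTuple):
    chrom: str      # 1) chromosome
    pos: int        # 2) coordinate (1-based)
    strand: str     # 3) strand
    context: str    # 4) 2 nt upstream .. 2 nt downstream, Watson direction
    ratio: float    # 5) methylation ratio
    reads: float    # 6) reads covering this locus
    c_count: int    # 7) unconverted Cs at this locus


def parse_methratio(fh: TextIO) -> Iterator[MethRatio]:
    """Yield one MethRatio per data line of a methratio.py output file."""
    for line in fh:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and a header row, if the file has one.
        if len(fields) < 7 or not fields[1].isdigit():
            continue
        chrom, pos, strand, context, ratio, reads, c_count = fields[:7]
        yield MethRatio(chrom, int(pos), strand, context,
                        float(ratio), float(reads), int(float(c_count)))
```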

Example:
    python methratio.py --chr=chr1,chr2 --ref=hg19.fa --out=methratio.txt rrbsmap_sample*.sam
    python methratio.py -d mm9.fa -o meth.txt -p bsmap_sample1.bsp bsmap_sample2.sam bsmap_sample3.bam

Note: For overlapping paired hits, nucleotides in the overlapped part should be counted only once instead of twice.
methratio.py can correctly handle such cases for SAM format output, but for BSP format it will still be counted twice,
because the BSP format does not contain mapping information of the mate.
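The overlap rule in this note can be illustrated with simple interval arithmetic. This is a sketch of the counting idea only, not methratio.py's implementation. It treats each mate as a (start, end) span in 1-based inclusive coordinates and ignores indels and clipping.

```python
from collections import Counter


def pair_coverage(mate1, mate2):
    """Per-position coverage from one read pair, overlap counted once.

    Each mate is a (start, end) span in 1-based inclusive reference
    coordinates; indels and clipping are ignored in this sketch.
    """
    covered = set(range(mate1[0], mate1[1] + 1)) | set(range(mate2[0], mate2[1] + 1))
    return Counter(covered)


def naive_pair_coverage(mate1, mate2):
    """Same, but counting each mate separately (overlap counted twice)."""
    cov = Counter(range(mate1[0], mate1[1] + 1))
    cov.update(range(mate2[0], mate2[1] + 1))
    return cov
```

For mates spanning 100..149 and 130..179, the naive count gives 100 base observations with position 140 seen twice. The overlap-aware count gives 80, with position 140 seen once.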


code
python methratio.py -d /Volumes/Bay3\ scratch/tmp/cgigas_alpha_v012.fa -o /Volumes/Bay3\ scratch/tmp/OUT_methratioBSMAP_v012.txt -s /Volumes/Bay3/Software/samtools /Volumes/Bay3\ scratch/tmp/BSMAP_output_trimmed_v0_1_2.sam


-
README.txt


ERROR
robertsmac:bsmap-2.42 sr320$ python methratio.py -d /Volumes/Bay3\ scratch/tmp/cgigas_alpha_v012.fa -o /Volumes/Bay3\ scratch/tmp/OUT_methratioBSMAP_v012.txt -s /Volumes/Bay3/Software/samtools /Volumes/Bay3\ scratch/tmp/BSMAP_output_trimmed_v0_1_2.sam
@ Tue Mar  6 08:55:20 2012: reading reference /Volumes/Bay3 scratch/tmp/cgigas_alpha_v012.fa ...
@ Tue Mar  6 08:55:39 2012: reading /Volumes/Bay3 scratch/tmp/BSMAP_output_trimmed_v0_1_2.sam ...
[samopen] no @SQ lines in the header.
[main_samview] random alignment retrieval only works for indexed BAM files.
@ Tue Mar  6 08:55:41 2012: writing /Volumes/Bay3 scratch/tmp/OUT_methratioBSMAP_v012.txt ...
@ Tue Mar  6 08:56:42 2012: Done!
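Note that the script still reports "Done!" despite the samtools errors. One plausible cause, which is an unconfirmed assumption, is the space in `Bay3 scratch`. If the samtools call is assembled as a single shell string, the path splits into two arguments, and samtools would then treat the second piece as a region; that matches the "random alignment retrieval" message. The command string below is hypothetical and is not taken from methratio.py's source. It only shows how a shell tokenizes the path with and without quoting.

```python
import shlex

path = "/Volumes/Bay3 scratch/tmp/BSMAP_output_trimmed_v0_1_2.sam"

# Hypothetical command string built by plain concatenation.
unquoted = "samtools view -S " + path
quoted = "samtools view -S " + shlex.quote(path)

# How a POSIX shell would tokenize each string:
print(shlex.split(unquoted))
# ['samtools', 'view', '-S', '/Volumes/Bay3', 'scratch/tmp/BSMAP_output_trimmed_v0_1_2.sam']
print(shlex.split(quoted))
# ['samtools', 'view', '-S', '/Volumes/Bay3 scratch/tmp/BSMAP_output_trimmed_v0_1_2.sam']
```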



--




Let's try placing the reference and SAM file in the local directory and running the code from there, so the paths no longer contain the space in "Bay3 scratch":

python methratio.py -d cgigas_alpha_v012.fa -o OUT_methratioBSMAP_v012.txt -s /Volumes/Bay3/Software/samtools BSMAP_output_trimmed_v0_1_2.sam


OUTPUT
http://aquacul4.fish.washington.edu/~steven/filefish/OUT_methratioBSMAP_v012.txt (~500MB)


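Other runs: adding -z (report loci with zero methylation ratios), then -z with -u (process only unique mappings)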

python methratio.py -d cgigas_alpha_v012.fa -z -o OUTz_methratioBSMAP_v012.txt -s /Volumes/Bay3/Software/samtools BSMAP_output_trimmed_v0_1_2.sam

OUTPUT
http://aquacul4.fish.washington.edu/~steven/filefish/OUTz_methratioBSMAP_v012.txt (~2.6GB)


python methratio.py -d cgigas_alpha_v012.fa -z -u -o OUTzu_methratioBSMAP_v012.txt -s /Volumes/Bay3/Software/samtools BSMAP_output_trimmed_v0_1_2.sam

OUTPUT
http://aquacul4.fish.washington.edu/~steven/filefish/OUTzu_methratioBSMAP_v012.txt  (~2.0GB)




Other format: BSP output (note the .bsp extension on -o)
-
./bsmap -a /Volumes/Bay3/Software/bismark_v0.6.4/filtered_Unlabeled_NoIndex_L003_R1_trimmed.fastq -d  /Volumes/Bay3\ scratch/tmp/cgigas_alpha_v012.fa -o /Volumes/Bay3\ scratch//tmp/BSMAP_output_trimmed_v0_1_2.bsp -p 1

OUTPUT
http://aquacul4.fish.washington.edu/~steven/filefish/BSMAP_output_trimmed_v0_1_2.bsp (~17GB)




Methratio output filtered (in Galaxy) to keep only + strand, CG context, and coverage > 10:
Galaxy102-[Filter_on_data_101].tabular
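The same filter can be reproduced outside Galaxy. This is a sketch, not the Galaxy filter expression that produced the file above. It assumes the methratio column order listed in the README (strand in column 3, 5-nt context in column 4, read coverage in column 6). Because the context runs from 2 nt upstream to 2 nt downstream of the C in Watson orientation, the C sits at index 2, so a + strand CpG has "CG" at indices 2 and 3.

```python
def keep_locus(fields, min_reads=10):
    """True for + strand CG sites covered by more than `min_reads` reads."""
    strand, context, reads = fields[2], fields[3], float(fields[5])
    # The C is at index 2 of the 5-nt context; "CG" at [2:4] marks a + strand CpG.
    return strand == "+" and context[2:4] == "CG" and reads > min_reads


def filter_methratio(in_fh, out_fh, min_reads=10):
    """Copy data lines that pass keep_locus; skip blanks and any header."""
    for line in in_fh:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7 or not fields[1].isdigit():
            continue
        if keep_locus(fields, min_reads):
            out_fh.write(line)
```

Keeping the test as a separate `keep_locus` function makes it easy to change the thresholds (for example, `min_reads=5`) without touching the file handling.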